[RFC][FEATURE] support manual parallelization strategy in shard parallel #816
Merged
Conversation
ZYHowell changed the title from "[RFC][WIP][FEATURE] support manual config of shard parallel" to "[RFC][WIP][FEATURE] support manual parallelization strategy in shard parallel" on Dec 18, 2022.
ZYHowell changed the title from "[RFC][WIP][FEATURE] support manual parallelization strategy in shard parallel" to "[RFC][FEATURE] support manual parallelization strategy in shard parallel" on Dec 24, 2022.
merrymercy approved these changes on Dec 25, 2022.
Good job!
Background
This PR provides an option to manually configure the strategy of `ShardParallel`. It accepts the same form of input as `pjit`, but replaces `pjit`'s `Mesh` with Alpa's `LogicalMesh`, which does not require physical resources. The API is supposed to take roughly the shape sketched below.
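A minimal sketch of the intended input format, assuming a pjit-like interface. The `pjit` part is real JAX (a ~0.3-era version, as pinned by Alpa); the commented-out `ManualShardingOption`, `mesh_shape`, and `in/out_axis_resources` names on the Alpa side are hypothetical placeholders, not the PR's actual API.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.experimental.maps import Mesh
from jax.experimental.pjit import pjit, PartitionSpec as P

# pjit's input format, which the manual ShardParallel option adopts:
# per-argument PartitionSpecs over named mesh axes, plus a physical Mesh.
devices = np.array(jax.devices()).reshape(2, 2)      # assumes 4 devices
mesh = Mesh(devices, axis_names=("data", "model"))

def matmul(x, w):
    return x @ w

with mesh:
    f = pjit(matmul,
             in_axis_resources=(P("data", None), P(None, "model")),
             out_axis_resources=P("data", "model"))
    y = f(jnp.ones((8, 8)), jnp.ones((8, 8)))

# Hypothetical Alpa analog: the same PartitionSpecs, but a LogicalMesh shape
# instead of physical devices (the names below are illustrative only).
# method = alpa.ShardParallel(manual_sharding_option=ManualShardingOption(
#     mesh_shape=(2, 2),
#     in_axis_resources=(P("data", None), P(None, "model")),
#     out_axis_resources=P("data", "model")))
```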
With the sharding spec of each variable, we can then use `hlo_module.set_spmd_parameters_shardings` or `set_spmd_output_sharding` to set the sharding specs of the module and let the SPMD partitioner infer the rest.

In addition, some partition strategies are defined by `with_sharding_constraint` inside the model's execution. To support this, we need another API, in which we monkey-patch `with_sharding_constraint`, because for pipeshard parallel the sharding spec cannot yet be determined at that point. A sketch of the monkey-patching idea follows.
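A rough sketch of the monkey-patching idea, not the PR's actual implementation: the wrapper records the requested spec instead of resolving it at trace time, since under pipeshard parallel the final spec is only known after stage and mesh assignment. The `_recorded_constraints` list is a hypothetical stand-in for wherever Alpa actually stores the deferred constraints.

```python
from jax.experimental import pjit as jax_pjit

_original_wsc = jax_pjit.with_sharding_constraint
_recorded_constraints = []   # hypothetical storage for deferred constraints

def _deferred_with_sharding_constraint(x, axis_resources):
    # Record the user's intent and return x unchanged; the spec is applied
    # later, once the (pipeshard) mesh for this part of the model is known.
    _recorded_constraints.append((x.shape, axis_resources))
    return x

# Patch while tracing the user's model, then restore.
jax_pjit.with_sharding_constraint = _deferred_with_sharding_constraint
try:
    pass  # trace the model here
finally:
    jax_pjit.with_sharding_constraint = _original_wsc
```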
TODO
- Support manual sharding specs in `ShardParallel`;
- Support manual sharding constraints in `ShardParallel`, then use `tensorflow/compiler/xla/hlo/experimental/auto_sharding` to solve under the constraints;
- Support manual sharding in `PipeshardParallel`, where the mesh shape is also manually specified;
- Support manual sharding in `PipeshardParallel`, where the mesh shape is determined by automatic stage construction;
- Support `with_sharding_constraint` in `ShardParallel` (see the sketch after this list);
- Support `with_sharding_constraint` in `PipeshardParallel`.
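For the last two items, a hypothetical user-side view: only `alpa.parallelize`, `alpa.ShardParallel`, and `with_sharding_constraint` are existing APIs here; the mesh-axis name and the idea that the manual option honors the constraint are assumptions for illustration.

```python
import alpa
import jax.numpy as jnp
from jax.experimental.pjit import with_sharding_constraint, PartitionSpec as P

@alpa.parallelize(method=alpa.ShardParallel())  # plus the new manual option
def train_step(w, x):
    h = x @ w
    # The model itself pins the layout of an intermediate value. Under
    # pipeshard parallel this spec cannot be resolved at trace time, which is
    # why with_sharding_constraint has to be monkey-patched (see above).
    h = with_sharding_constraint(h, P(None, "model"))
    return jnp.sum(h)
```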